Fast Parameter Tuning with Approximations at Scale

Authors

  • Chengjie Qin
  • Florin Rusu
Abstract

Parameter tuning is a fundamental problem that has to be handled by any Big Data analytics system. Identifying the optimal model parameters is an interactive, human-in-the-loop process that requires many hours – if not days and months – even for experienced data scientists. We argue that the inability to evaluate multiple parameter configurations simultaneously and the lack of support for quickly identifying sub-optimal configurations are the principal causes. In this paper, we develop two database-inspired techniques for efficient parameter tuning. Speculative parameter testing applies advanced parallel multi-query processing methods to evaluate several configurations concurrently and efficiently. Online aggregation is applied to identify sub-optimal configurations and halt the corresponding executions early in the processing. We apply the proposed techniques to distributed gradient descent optimization – batch and stochastic – for support vector machines and logistic regression models. We evaluate their performance over terascale synthetic and real datasets. The results confirm that as many as 32 configurations can be evaluated concurrently almost as fast as one, while sub-optimal configurations are detected accurately in as little as 1/20th of the time.

Gradient Descent. Consider the following optimization problem with a linearly separable objective function: $\min_{w \in \mathbb{R}^d} \sum_{i=1}^{N} f(w, z_i)$, in which a d-dimensional vector $w \in \mathbb{R}^d$, $d \geq 1$, has to be found such that the objective function is minimized. The constants $z_i$, $1 \leq i \leq N$, correspond to tuples in a database table. Essentially, each term in the objective function corresponding to a tuple $z_i$ can be viewed as a separate function $f_i(w) = f(w, z_i)$. Gradient descent is an iterative method. The main idea is to start from an arbitrary vector $w^{(0)}$ and then to iteratively determine new vectors $w^{(k+1)}$ such that the objective function decreases at every iteration, i.e., $f(w^{(k+1)}) < f(w^{(k)})$. $w^{(k+1)}$ is determined by moving in the direction opposite to the gradient, or subgradient, of the function $f$. Formally, this can be written as $w^{(k+1)} = w^{(k)} - \alpha_k \nabla f(w^{(k)})$, where $\alpha_k \geq 0$ is the step size. When the gradient $\nabla f(w)$ is computed using all the tuples, the method is called batch gradient descent (BGD). One variant of gradient descent is stochastic gradient descent (SGD), where $\nabla f(w)$ is approximated with the gradient of a single term. In SGD, typically $\alpha_k \to 0$ as $k \to \infty$ and the order of the tuples has to be randomized at every iteration.

Speculative Iterations. We address two fundamental problems of gradient descent methods: convergence detection and parameter tuning. As with any iterative method, gradient descent convergence is achieved when there is no further decrease in the objective function, i.e., the loss, across consecutive iterations. While convergence detection obviously requires loss evaluation at every iteration, the standard practice, e.g., in Vowpal Wabbit and MLlib, is to discard detection altogether and execute the algorithm for a fixed number of iterations. The reason is simple: loss computation requires a complete pass over the data, which doubles the execution time. This approach suffers from at least two problems. First, it is impossible to detect convergence before the specified number of iterations finishes. Second, it is impossible to identify bad parameter configurations, i.e., configurations that do not lead to model convergence. Recall that both BGD and SGD depend on a series of parameters, the most important of which is the step size. Finding the optimal step size typically requires many trials. Discarding loss computation increases both the number of trials and the duration of each trial.
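To make the two ideas above concrete, the following is a minimal, self-contained Python sketch, not the authors' parallel implementation, of speculative step-size testing for BGD combined with early termination of sub-optimal configurations. The squared-error objective, the function names (squared_loss, gradient, speculative_bgd), and the sample-based loss estimate that stands in for online aggregation are illustrative assumptions; the sketch also loops over configurations sequentially rather than sharing one pass over the data.

```python
import numpy as np

def squared_loss(w, X, y):
    """Loss sum_i f(w, z_i) with f(w, (x_i, y_i)) = (x_i . w - y_i)^2 / 2."""
    r = X @ w - y
    return 0.5 * np.dot(r, r)

def gradient(w, X, y):
    """Full (batch) gradient of the objective above, computed over all tuples."""
    return X.T @ (X @ w - y)

def speculative_bgd(X, y, step_sizes, iterations=50, sample_frac=0.05, seed=0):
    """Run one BGD model per candidate step size and drop configurations whose
    loss estimate, computed on a small data sample (a cheap stand-in for the
    paper's online-aggregation estimator), stops improving -- i.e., they are
    detected as sub-optimal or converged early."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    sample = rng.choice(n, size=max(1, int(sample_frac * n)), replace=False)
    Xs, ys = X[sample], y[sample]

    models = {a: np.zeros(d) for a in step_sizes}   # one model per configuration
    best_loss = {a: np.inf for a in step_sizes}
    active = set(step_sizes)

    for k in range(iterations):
        for a in list(active):
            # w^{(k+1)} = w^{(k)} - alpha_k * grad f(w^{(k)})
            models[a] = models[a] - a * gradient(models[a], X, y)
            est = squared_loss(models[a], Xs, ys)   # loss estimate on the sample
            if np.isfinite(est) and est < best_loss[a]:
                best_loss[a] = est
            else:
                active.discard(a)                   # diverging or stalled: halt it
    # return the configuration with the lowest estimated loss
    best = min(best_loss, key=best_loss.get)
    return best, models[best]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(10_000, 5))
    y = X @ np.array([1.0, -2.0, 0.5, 3.0, 0.0]) + 0.1 * rng.normal(size=10_000)
    alpha, w = speculative_bgd(X, y, step_sizes=[1e-1, 1e-3, 1e-4, 1e-5])
    print("selected step size:", alpha)
```

In the paper's setting, the candidate configurations share a single pass over the data via parallel multi-query processing rather than taking one pass each as above, which is what makes evaluating 32 configurations nearly as cheap as evaluating one.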

Similar Resources

A FAST FUZZY-TUNED MULTI-OBJECTIVE OPTIMIZATION FOR SIZING PROBLEMS

The most recent approaches of multi-objective optimization constitute application of meta-heuristic algorithms for which, parameter tuning is still a challenge. The present work hybridizes swarm intelligence with fuzzy operators to extend crisp values of the main control parameters into especial fuzzy sets that are constructed based on a number of prescribed facts. Such parameter-less particle ...

Efficient and Robust Parameter Tuning for Heuristic Algorithms

The main advantage of heuristic or metaheuristic algorithms compared to exact optimization methods is their ability in handling large-scale instances within a reasonable time, albeit at the expense of losing a guarantee for achieving the optimal solution. Therefore, metaheuristic techniques are appropriate choices for solving NP-hard problems to near optimality. Since the parameters of heuristi...

New Maximum Power Point Tracking Technique Based on P&O Method

In the most described maximum power point tracking (MPPT) methods in the literatures, the optimal operation point of the photovoltaic (PV) systems is estimated by linear approximations. However, these approximations can lead to less optimal operating conditions and significantly reduce the performances of the PV systems. This paper proposes a new approach to determine the maximum power point (M...

Asymptotic Approximations of the Solution for a Traveling String under Boundary Damping

Transversal vibrations of an axially moving string under boundary damping are investigated. Mathematically, it represents a homogenous linear partial differential equation subject to nonhomogeneous boundary conditions. The string is moving with a relatively (low) constant speed, which is considered to be positive.  The string is kept fixed at the first end, while the other end is tied with the ...

Large-scale Inversion of Magnetic Data Using Golub-Kahan Bidiagonalization with Truncated Generalized Cross Validation for Regularization Parameter Estimation

In this paper a fast method for large-scale sparse inversion of magnetic data is considered. The L1-norm stabilizer is used to generate models with sharp and distinct interfaces. To deal with the non-linearity introduced by the L1-norm, a model-space iteratively reweighted least squares algorithm is used. The original model matrix is factorized using the Golub-Kahan bidiagonalization that proje...

Tuning Shape Parameter of Radial Basis Functions in Zooming Images using Genetic Algorithm

Image zooming is one of the current issues of image processing where maintaining the quality and structure of the zoomed image is important. To zoom an image, it is necessary that the extra pixels be placed in the data of the image. Adding the data to the image must be consistent with the texture in the image and not to create artificial blocks. In this study, the required pixels are estimated ...

Journal:

Volume   Issue

Pages   -

Publication date: 2014